High-Performance Haplotype Assembly

Authors

  • Marco Aldinucci
  • Andrea Bracciali
  • Tobias Marschall
  • Murray Patterson
  • Nadia Pisanti
  • Massimo Torquati
Abstract

The problem of Haplotype Assembly is an essential step in human genome analysis. It is typically formalised as the Minimum Error Correction (MEC) problem, which is NP-hard. MEC has been approached using heuristics, integer linear programming, and fixed-parameter tractability (FPT), including approaches whose runtime is exponential in the length of the DNA fragments obtained by the sequencing process. Technological improvements are currently increasing fragment length, which drastically raises the computational cost of such methods. We present pWhatsHap, a multi-core parallelisation of WhatsHap, a recent FPT optimal approach to MEC. WhatsHap moves complexity from fragment length to fragment overlap and is hence of particular interest given current trends in sequencing technology. pWhatsHap further improves the efficiency of solving the MEC problem, as shown by experiments performed on datasets with high coverage.
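To make the MEC objective and the idea of moving complexity from fragment length to fragment overlap concrete, here is a minimal Python sketch, assuming binary alleles (0/1), unweighted flips, and no all-heterozygous constraint. Each fragment is a dict from SNP column to allele. The dynamic program keeps, at every column, the cost of each bipartition of the fragments spanning that column, so its work grows with coverage (the number of overlapping fragments) rather than with fragment length. The names `local_cost`, `mec_dp` and `mec_brute_force` are hypothetical; this is an illustration in the spirit of a WhatsHap-style column-wise DP, not the WhatsHap or pWhatsHap implementation, and it does not model the multi-core parallelisation.

```python
from itertools import product
from typing import Dict, List, Tuple

# Assumption: a fragment maps SNP column index -> observed allele (0 or 1); fragments are non-empty.
Fragment = Dict[int, int]


def local_cost(column: int, fragments: List[Fragment],
               active: Tuple[int, ...], parts: Tuple[int, ...]) -> int:
    """Minimum number of flips at one column for a fixed bipartition of fragments.

    Each haplotype independently takes the majority allele of its part,
    so the cost of a part is the count of its minority allele.
    """
    counts = [[0, 0], [0, 0]]  # counts[part][allele]
    for frag_id, part in zip(active, parts):
        allele = fragments[frag_id].get(column)
        if allele is not None:  # a fragment may span a column without covering it
            counts[part][allele] += 1
    return min(counts[0]) + min(counts[1])


def mec_dp(fragments: List[Fragment], n_columns: int) -> int:
    """Column-wise DP over bipartitions of the fragments spanning each column."""
    spans = [(min(f), max(f)) for f in fragments]
    prev_active: Tuple[int, ...] = ()
    prev_cost: Dict[Tuple[int, ...], int] = {(): 0}
    for col in range(n_columns):
        active = tuple(i for i, (s, e) in enumerate(spans) if s <= col <= e)
        shared = [i for i in active if i in prev_active]
        cost: Dict[Tuple[int, ...], int] = {}
        for parts in product((0, 1), repeat=len(active)):
            assign = dict(zip(active, parts))
            # Cheapest predecessor that agrees on fragments active at both columns.
            best = min(
                c for prev_parts, c in prev_cost.items()
                if all(dict(zip(prev_active, prev_parts))[i] == assign[i] for i in shared)
            )
            cost[parts] = best + local_cost(col, fragments, active, parts)
        prev_active, prev_cost = active, cost
    return min(prev_cost.values())


def mec_brute_force(fragments: List[Fragment], n_columns: int) -> int:
    """Exhaustive reference: try every assignment of all fragments to two haplotypes."""
    everyone = tuple(range(len(fragments)))
    return min(
        sum(local_cost(col, fragments, everyone, parts) for col in range(n_columns))
        for parts in product((0, 1), repeat=len(fragments))
    )


if __name__ == "__main__":
    # Reads from haplotypes 010 and 101, where the last read carries one error at column 1.
    frags = [{0: 0, 1: 1, 2: 0}, {0: 1, 1: 0, 2: 1}, {1: 1, 2: 0},
             {0: 1, 1: 0}, {0: 0, 1: 0, 2: 0}]
    print(mec_dp(frags, 3), mec_brute_force(frags, 3))
```

The brute-force reference enumerates 2^n assignments of all n fragments, whereas the DP only enumerates 2^k bipartitions of the k fragments spanning each column, which is why this style of algorithm stays tractable for long reads at moderate coverage.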

Similar resources

Self-organizing map approaches for the haplotype assembly problem

Haplotype assembly is to reconstruct a pair of haplotypes from SNP values observed in a set of individual DNA fragments. In this paper, we focus on studying minimum error correction (MEC) model for the haplotype assembly problem and explore self-organizing map (SOM) methods for this problem. Specifically, haplotype assembly by MEC is formulated into an integer linear programming model. Since th...

Tumor Haplotype Assembly Algorithms for Cancer Genomics

The growing availability of inexpensive high-throughput sequence data is enabling researchers to sequence tumor populations within a single individual at high coverage. But, cancer genome sequence evolution and mutational phenomena like driver mutations and gene fusions are difficult to investigate without first reconstructing tumor haplotype sequences. Haplotype assembly of single individual t...

Haplotype assembly in polyploid genomes and identical by descent shared tracts

MOTIVATION Genome-wide haplotype reconstruction from sequence data, or haplotype assembly, is at the center of major challenges in molecular biology and life sciences. For complex eukaryotic organisms like humans, the genome is vast and the population samples are growing so rapidly that algorithms processing high-throughput sequencing data must scale favorably in terms of both accuracy and comp...

HapCompass: A Fast Cycle Basis Algorithm for Accurate Haplotype Assembly of Sequence Data

Genome assembly methods produce haplotype phase ambiguous assemblies due to limitations in current sequencing technologies. Determining the haplotype phase of an individual is computationally challenging and experimentally expensive. However, haplotype phase information is crucial in many bioinformatics workflows such as genetic association studies and genomic imputation. Current computational ...

Improving the performance measurement using overall equipment effectiveness in an automotive industry

Considering the present business competitive scenario, the automotive industry is under pressure to achieve higher productivity. A high level of performance and quality standard could be achieved through improving the Overall Equipment Effectiveness (OEE) of the equipment in an automotive industry. Thus, the aim of this study is to investigate the performance measurement through OEE theory in a...

HapCHAT: Adaptive haplotype assembly for efficiently leveraging high coverage in long reads

Motivation: Haplotype assembly is the process of reconstructing the haplotypes of an individual from sequencing reads. Computational methods for this problem have shown to achieve high accuracy on long reads, which are becoming cheaper to produce and more widely available. Larger amounts of data, usually originating from increased coverage, are highly beneficial for improving the quality of the...

Publication date: 2014